Abstract

Background: Serum protein profiles have been investigated frequently to discover early biomarkers for breast 
cancer. So far, these studies used biological samples collected at or after diagnosis. This may limit these studies'
value in the search for cancer biomarkers because of the often advanced tumor stage, and consequently risk of 
reverse causality. We present for the first time pre-diagnostic serum protein profiles in relation to breast cancer, 
using the Prospect-EPIC (European Prospective Investigation into Cancer and nutrition) cohort. 

Methods: In a nested case-control design we compared 68 women diagnosed with breast cancer within three 
years after enrollment, with 68 matched controls for differences in serum protein profiles. All samples were 
analyzed with SELDI-TOF MS (surface enhanced laser desorption/ionization time-of-flight mass spectrometry). In a
subset of 20 case-control pairs, the serum proteome was identified and relatively quantified using isobaric Tags for 
Relative and Absolute Quantification (iTRAQ) and online two-dimensional nano-liquid chromatography coupled 
with tandem MS (2D-nanoLC-MS/MS). 

Results: Two SELDI-TOF MS peaks with m/z 3323 and 8939, which probably represent doubly charged 
apolipoprotein C-I and C3a des-arginine anaphylatoxin (C3a desArg), were higher in pre-diagnostic breast cancer
serum (p = 0.02 and p = 0.06, respectively). With 2D-nanoLC-MS/MS, afamin, apolipoprotein E and isoform 1 of 
inter-alpha trypsin inhibitor heavy chain H4 (ITIH4) were found to be higher in pre-diagnostic breast cancer (p < 
0.05), while alpha-2-macroglobulin and ceruloplasmin were lower (p < 0.05). C3a desArg and ITIH4 have previously 
been related to the presence of symptomatic and/or mammographically detectable breast cancer. 

Conclusions: We show that serum protein profiles are already altered up to three years before breast cancer 
detection.

Background

Early diagnosis of breast cancer by mammography is 
one of the most important factors contributing to the 
successful treatment of breast cancer. Further improve- 
ment of early diagnosis might be possible with the use 
of blood-based biomarkers. Such markers could indicate 
the presence of a breast tumour already in an early
stage, preferably even before the lesion is visible on a
mammogram. This would be particularly relevant for 
young women for whom mammographic screening is 
less effective due to lower sensitivity (25 to 59%) [1]. 
Although the addition of magnetic resonance imaging 
(MRI) to mammography could improve sensitivity [1], a 
blood test would be less expensive and easier to perform 
on a large scale. 

Many studies have been executed in an attempt to 
find such early breast cancer biomarkers, for example 
using surface enhanced laser desorption/ionization
time-of-flight mass spectrometry (SELDI-TOF MS)
[2-9]. Several proteins in the blood were indeed found 
to be related to the presence of breast cancer [2-9]. 
However, only a few of these proteins were reported to
be discriminative for breast cancer in more than one 
study, and even then, some proteins found to be 
higher in patients in one study, were found to be 
lower in another study [2-9]. These discrepancies may 
be caused by differences between cases and controls in 
collection, processing and storage of their blood sam- 
ples, both within and between studies [10-16]. On the 
other hand, it cannot be excluded that findings were 
simply due to chance. 

Until now, all studies, except one by Pitteri et al. 
[17], used biological samples collected at or after diag- 
nosis of breast cancer, and thus findings may reflect 
consequences rather than predictors of malignancy. 
Thus, it remains unclear whether these proteins are 
able to identify women with a breast lesion which is 
not yet visible on a mammogram and does not induce 
clinical symptoms yet. Pitteri et al. [17] previously 
investigated plasma samples prospectively collected in 
the Women's Health Initiative Observational Study. 
Epidermal Growth Factor Receptor (EGFR) was found 
to be increased in plasma samples collected 17 months 
before breast cancer diagnosis. In the present study we 
performed serum protein profiling of breast cancer 
samples for the first time in a nested case-control 
study. For this we used the Prospect-EPIC (European 
Prospective Investigation into Cancer and nutrition) 
cohort [18], where at study enrollment blood samples 
of approximately 17,000 healthy women were collected 
and stored. For the current study we selected those 
women who were diagnosed with breast cancer within 
3 years after enrollment in the cohort. Pre-diagnostic 
protein profiles of their serum samples, taken at 
enrollment, were compared to those of matched con- 
trols who remained healthy. 

Our first aim was to assess whether previously 
reported proteins are also discriminative in serum sam- 
ples taken up to three years before breast cancer diagno- 
sis. We also set out to discover new discriminating 
proteins. To this end, we used SELDI-TOF MS, which
can measure multiple proteins simultaneously
in a high-throughput fashion. Next, in a subset
of the case-control pairs, we analyzed the serum protein 
profiles with isobaric Tags for Relative and Absolute 
Quantification (iTRAQ) -labeling, and two-dimensional 
online nano-liquid chromatography coupled with tan- 
dem mass spectrometry (2D-nanoLC-MS/MS), by which 
the detected proteins are relatively quantified and imme- 
diately identified. SELDI-TOF MS and 2D-nanoLC-MS/ 
MS cover different mass ranges and therefore are able 
to detect different proteins. 



In summary, we set out to find new proteins as well as 
to test previously detected proteins in patients still free 
of symptomatic breast cancer. 

Methods 

Study population 

We performed a case-control study nested within the 
Prospect-EPIC cohort. Prospect-EPIC is one of the two 
Dutch cohorts participating in the European Prospective 
Investigation into Cancer and nutrition, which includes 
ten European countries. From 1993 to 1997, 17,357 
women from Utrecht and vicinity, aged 50-69 years, 
enrolled in this cohort through the national population- 
based breast cancer screening program [18]. Women 
filled out an extensive food frequency questionnaire and 
a general questionnaire. The latter contained questions 
on demographic characteristics, medical history, lifestyle 
characteristics, and risk factors for cancer and other 
chronic diseases [18,19]. 

Prospect-EPIC participants also donated a blood sam- 
ple. Blood collection, processing and storage were per- 
formed following a strict protocol. After collection, 
blood samples were stored in a climate controlled refrig- 
erator at 5°C overnight. The next day blood samples 
were centrifuged at 1500 g for 20 minutes. After centri- 
fuging, the serum was put in 0.5 ml straws. These straws 
were stored in a -86°C freezer until they were trans- 
ported to liquid nitrogen tanks (-196°C), where they 
have been stored since then. 

Participants were followed for vital and health status. 
Information on dates of death and migration was 
obtained through the municipal registries. Causes of 
death were obtained from the Central Bureau for Statis- 
tics (CBS). Through yearly linkage with the regional and 
national cancer registries information about cancer inci- 
dence and stage of disease at diagnosis (tumor behavior, 
tumor size, lymph node involvement and metastasis) 
was obtained [18]. Until December 31st, 2006, 687
women were diagnosed with breast cancer in the Pro- 
spect-EPIC cohort. All participants signed an informed 
consent and the study was approved by the Institutional 
Review Board of the University Medical Center Utrecht. 

For the current study we selected women who were 
diagnosed with breast cancer within three years after 
enrollment into the cohort, and who were postmeno- 
pausal at enrollment (no menstrual periods in last 12 
months). Women were excluded if they had had cancer 
before, were suffering from diabetes, were current smo- 
kers, or were currently using oral contraceptives, or 
menopausal hormone therapy (HT). This was done to 
obtain a homogeneous group with respect to hormone 
levels, smoking status, and metabolic status, because 
these factors (may) influence serum protein profiles 
[20]. Sixty-eight women were eventually included as
cases. Controls were participants of the same cohort. We
matched each case with one postmenopausal control 
that remained free of breast cancer up to the time the 
case was diagnosed. Additional matching factors were 
age at enrollment (± 1 year) and date of enrollment (± 
1/2 year). For controls the same exclusion criteria were 
applied as for cases. Differences between cases and con- 
trols, and between samples of cases and controls, were 
tested with independent samples T test for normally dis- 
tributed continuous variables, with Mann-Whitney U 
test for other continuous variables, and Pearson Chi- 
Square for categorical variables. 
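
As an illustration of the categorical comparison described above, the following minimal sketch computes a Pearson Chi-Square test for a 2x2 table using only the Python standard library. The function name is hypothetical, and the absence of a continuity correction is an assumption; the counts are the former/never oral contraceptive use figures reported in Table 1.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Layout:            cases  controls
        exposed          a       b
        unexposed        c       d
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability reduces to erfc.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Former versus never use of oral contraceptives (Table 1):
# 36 and 32 cases, 40 and 28 controls.
chi2, p = pearson_chi_square_2x2(36, 40, 32, 28)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

Under these assumptions the sketch gives a p-value that rounds to the 0.490 listed for oral contraceptive use in Table 1.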

SELDI-TOF MS analysis 

We performed serum protein profiling on immobilized 
metal affinity capture (IMAC30) ProteinChip arrays 
(Bio-Rad Labs, Hercules, CA, USA) activated with nickel
as described in our previous study [9]. The total sample 
set was analyzed in duplicate, in three separate batches, 
within two weeks. Duplicates were analyzed within
the same batch, but on different arrays, to correct for 
inter-array variability. Cases and controls were evenly, 
and randomly, distributed over the three batches. Sam- 
ples in one batch were prepared and applied to the 
arrays, followed by detection of the proteins bound to 
the arrays with SELDI-TOF MS, on the same day. 
SELDI-TOF MS was performed using the PBS-IIC Pro- 
teinChip Reader (Bio-Rad Labs). See Additional file 1 
for settings of the ProteinChip Reader. 

Since analyzing samples in different batches, on differ- 
ent days, introduces inter-batch variation [16,21,22], 
spectra were processed per batch. For this, we used the 
ProteinChip Software package, version 3.1 (Bio-Rad 
Labs). Spectra in which normalization revealed too low 
or too high total ion current were excluded from further 
analysis. The cases and controls matched with these 
subjects were also excluded from the paired analyses. 
Subsequently, the Biomarker Wizard (BMW) software 
application (Bio-Rad Labs) was used to detect peaks. 
This was performed in each batch separately. See Addi- 
tional file 1 for way of processing the spectra and for 
the settings for peak detection. 

SELDI-TOF MS data analysis 

Peak information from all acquired spectra was exported 
from the ProteinChip Software to SPSS 15.0 for statisti- 
cal analysis. First, we estimated the reproducibility of 
the duplicates, by calculating the median coefficient of
variation (CV) for each detected peak, in cases and controls
together. The averaged intensities of the peaks
with the same mass in the duplicate spectra of a subject 
were used for further analysis. To be able to merge peak 
intensity data of the three batches, averaged peak intensities
were first Z-log-transformed per batch [23].
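
The duplicate handling and per-batch standardization described above can be sketched as follows. This is an illustration only: it assumes that the Z-log transformation means taking the natural logarithm of the averaged intensities and then standardizing to mean 0 and SD 1 within each batch (the exact definition is given in reference [23] and may differ), and all function names and intensity values are hypothetical.

```python
import math
import statistics


def duplicate_cv(x1, x2):
    """Coefficient of variation (SD / mean) of one duplicate pair."""
    return statistics.stdev([x1, x2]) / ((x1 + x2) / 2.0)


def median_cv(duplicates):
    """Median CV for one peak; duplicates is a list of (x1, x2) tuples."""
    return statistics.median(duplicate_cv(a, b) for a, b in duplicates)


def z_log_per_batch(intensities_by_batch):
    """Log-transform intensities, then standardize within each batch.

    intensities_by_batch maps a batch id to a list of positive intensities.
    Assumed definition of the Z-log transformation (see lead-in).
    """
    transformed = {}
    for batch, values in intensities_by_batch.items():
        logs = [math.log(v) for v in values]
        mu = statistics.mean(logs)
        sd = statistics.stdev(logs)
        transformed[batch] = [(x - mu) / sd for x in logs]
    return transformed


# Hypothetical duplicate intensities of one peak in two batches.
batch_duplicates = {
    "batch1": [(10.0, 12.0), (20.0, 18.0), (15.0, 15.5)],
    "batch2": [(30.0, 33.0), (50.0, 46.0), (41.0, 39.0)],
}
# Average the duplicates per subject before transforming.
averaged = {b: [(x1 + x2) / 2.0 for x1, x2 in pairs]
            for b, pairs in batch_duplicates.items()}
z = z_log_per_batch(averaged)
all_pairs = [p for pairs in batch_duplicates.values() for p in pairs]
print("median CV:", round(median_cv(all_pairs), 3))
print({b: [round(v, 2) for v in vals] for b, vals in z.items()})
```

Standardizing within a batch puts intensities from different batches on a common scale, which is what allows the three batches to be merged.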



Paired samples T tests were used to test if the mean 
Z-log-transformed peak intensities in the pre-diagnostic 
breast cancer serum samples were statistically significantly
different from those in the control samples. We
performed correction for multiple testing, using the
False Discovery Rate (FDR) method suggested by Benjamini
and Hochberg. The FDR controls the expected proportion
of falsely rejected hypotheses [24]. We chose
10% as an acceptable proportion of false positive results 
(q-value = 0.10). We also investigated whether any sig- 
nificant relation could be explained by any of the subject 
characteristics other than breast cancer status. To this 
end, bivariate conditional logistic regression analyses 
were performed including the peak intensity (continu- 
ous) and one of the following characteristics: Body Mass 
Index (BMI), former use of oral contraceptives, former 
use of HT, number of children, smoking habits, alcohol 
consumption, blood sample's time in refrigerator 
between blood collection and centrifugation, and sam- 
ple's time in -86°C freezer until storage at liquid nitro- 
gen. The adjusted odds ratios (OR) resulting from the 
analyses were compared with the crude breast cancer 
OR in relation to peak intensity. To test whether the
intensities of peaks that differed between cases and controls
also differed between cases whose samples were collected
closer to diagnosis and cases whose samples were collected
further from diagnosis, we performed independent
samples T tests. To this end, cases who were diagnosed
based on the first mammogram after enrollment
were compared to cases who had a negative first mam- 
mogram and who were diagnosed at a later moment. 
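
The multiple testing correction described above can be sketched with the standard Benjamini-Hochberg step-up procedure. The function name is hypothetical, and the sketch is not the software used in the study; the p-values are those listed for the 22 SELDI-TOF MS peaks in Table 3, with q = 0.10 as chosen above.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# P-values of the 22 peaks as listed in Table 3 (ordered by m/z).
table3_p = [0.61, 0.02, 0.18, 0.80, 0.21, 0.43, 0.86, 0.81, 0.27, 0.23,
            0.43, 0.15, 0.62, 0.45, 0.21, 0.21, 0.55, 0.06, 0.88, 0.08,
            0.54, 0.26]
print(sum(benjamini_hochberg(table3_p, q=0.10)))  # number of peaks passing
```

With these inputs the smallest p-value (0.02) exceeds its threshold of 0.10/22, so no peak passes, in line with the result reported for the SELDI-TOF MS peaks.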

Sample preparation for 2D-nanoLC-MS/MS 

We restricted the 2D-nanoLC-MS/MS analysis to 20 
case-control pairs, because of costs and time restric- 
tions. The cases included in this sub-analysis were diag- 
nosed with breast cancer within the first 14 months 
after enrollment in the study. 

The serum samples were depleted of the high abun- 
dant proteins albumin, IgG, antitrypsin, IgA, transferrin 
and haptoglobin, using the Multiple Affinity Removal 
Spin Cartridge (Hu-6HC, Agilent Technologies, Santa 
Clara, CA, USA) as described in the manufacturer's pro- 
tocol. Thereafter the samples were desalted using 
Microcon Centrifugal Filter units (Millipore, Billerica, 
MA, USA). The total protein content of the depleted
sera was determined using a protein assay kit (BCA™,
Pierce, Thermo Scientific, Rockford, IL, USA). The proteins
(50 µg per sample) were reduced using
tris(2-carboxyethyl)phosphine, alkylated using iodoacetamide and
then trypsin digested (Roche Diagnostics GmbH,
Mannheim, Germany) overnight and evaporated to dryness
using a SpeedVac. Peptides were labeled with 4-plex 
iTRAQ reagents (iTRAQ reagent kit-plasma, Applied
Biosystems, Foster City, CA, USA) according to the
instructions of the manufacturer. 

Two case-control pairs were labeled with different iso- 
baric tags in each iTRAQ-labeling set. The first case was 
labeled with tag 114 and the matching control with
tag 115, the next case was labeled with tag 116 and the
matching control with tag 117. The 4 labeled samples
were finally pooled into a new sample tube. A total of 
10 iTRAQ-labeled sample sets consisting of two case- 
control pairs were generated. 
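
The labeling scheme above can be written down as a small layout helper. This is a sketch only: the function name and the pair identifiers 1 to 20 are hypothetical, and the order in which pairs were assigned to sets is an assumption. The tag masses 114 to 117 are the four reporter ions of the 4-plex iTRAQ reagents.

```python
ITRAQ_TAGS = (114, 115, 116, 117)  # 4-plex iTRAQ reporter ion masses


def assign_itraq_sets(pair_ids):
    """Group case-control pairs two at a time and assign 4-plex tags.

    Returns a list of sets; each set maps a tag to (pair id, role).
    """
    if len(pair_ids) % 2:
        raise ValueError("need an even number of case-control pairs")
    sets = []
    for i in range(0, len(pair_ids), 2):
        first, second = pair_ids[i], pair_ids[i + 1]
        samples = [(first, "case"), (first, "control"),
                   (second, "case"), (second, "control")]
        sets.append(dict(zip(ITRAQ_TAGS, samples)))
    return sets


# Twenty hypothetical pair identifiers give ten labeled sample sets.
layout = assign_itraq_sets(list(range(1, 21)))
print(len(layout))
print(layout[0])
```

Keeping each case and its matched control within the same labeled set means that every case/control ratio is measured within a single run.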

2D-nanoLC-MS/MS analysis 

The 10 iTRAQ-labeled sample sets were analyzed using 
a quadrupole-time-of-flight mass spectrometer (QSTAR
pulsar; Applied Biosystems), equipped with a nanoelec- 
trospray source (Proxeon, Odense, Denmark), and con- 
nected to a 2D-nanoLC system equipped with a 
capillary and nano pump (1100 series; Agilent Technol- 
ogies). See Additional file 2 for details about the used 
columns and mobile phases. The LC system was 
coupled on-line to a fused-silica PicoTip (50 µm i.d. ×
360 µm o.d. × 8 µm tip; New Objective, Woburn, MA,
USA). Details about acquisition and calibration are also 
described in Additional file 2. 

2D-nanoLC-MS/MS data analysis 

Protein identifications and quantifications were per- 
formed using Protein Pilot 1.0 (Applied Biosystems) in 
which the Paragon search algorithm was applied. Proteins
were searched against the IPI human protein database
(IPI human v3.40) downloaded from http://www.ebi.ac.uk [25].
See Additional file 3 for details on search
parameters and data processing. 

In some runs, some peptides were unusable for quantification
due to an artificially low signal of the signature
ions or because the peptide sequence was shared by 
other proteins. In those cases the peptides were 
excluded from quantification. No iTRAQ ratio was cal- 
culated if there was not one usable peptide left. If only 
one peptide was usable for quantification of a protein 
then no error factor (EF) was calculated. A case-control 
pair was excluded when no ratio and/or EF could be 
calculated for this pair. Only proteins that could be 
measured in at least 14 of the 20 case-control pairs 
were selected for further analysis. 

The ratios and the EFs for a protein, in the different
pairs, were used to fit a random effects model. We
used the random effects model since we assumed heterogeneity
between the ratios of the different pairs that is
partly based on chance variation, but also on
true variation between the pairs. The random effects
model resulted in a weighted mean ratio with a 95%
confidence interval (95%CI) for every protein. We also
applied correction for multiple testing using the FDR
method on these results. We again chose 10% as an
acceptable proportion of false positive results.
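
One common way to obtain such a weighted mean ratio is DerSimonian-Laird random effects pooling on the log scale, sketched below. This is an illustration under stated assumptions, not necessarily the estimator used in the study: it assumes that each EF describes a 95% interval from ratio / EF to ratio × EF, so that the standard error of the log ratio is ln(EF) / 1.96. The function name and the example ratios and EFs are hypothetical.

```python
import math


def random_effects_mean_ratio(ratios, error_factors):
    """DerSimonian-Laird random effects pooling of case/control ratios.

    Works on the log scale. Assumes each EF (> 1) spans a 95% interval
    (ratio / EF, ratio * EF), so SE(log ratio) = ln(EF) / 1.96.
    Returns (pooled ratio, lower 95% limit, upper 95% limit, two-sided p).
    """
    y = [math.log(r) for r in ratios]
    v = [(math.log(ef) / 1.96) ** 2 for ef in error_factors]
    w = [1.0 / vi for vi in v]
    k = len(y)
    # Fixed effect estimate and Cochran's Q for heterogeneity.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-pair variance
    # Random effects weights add tau2 to each within-pair variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))  # two-sided normal p
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), p)


# Hypothetical case/control ratios and EFs for one protein in five pairs.
ratios = [1.10, 1.05, 1.20, 0.98, 1.12]
efs = [1.15, 1.10, 1.30, 1.20, 1.12]
mean_ratio, lower, upper, p_value = random_effects_mean_ratio(ratios, efs)
print(f"{mean_ratio:.2f} (95%CI: {lower:.2f}-{upper:.2f}), p = {p_value:.3f}")
```

Pairs with a small EF receive more weight, and the between-pair variance widens the confidence interval when the ratios of the different pairs disagree more than their EFs would suggest.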

Results 

Study population 

Characteristics of the total study population are pre- 
sented in Table 1. About half of both cases and controls 
used oral contraceptives in the past, but the cases used 
them for a longer period of time than the controls 
(median: 10 years and 4.5 years, respectively; p = 0.018).
Cases were somewhat more often nulliparous
(15%) than controls (7%), and among women with children,
controls had more children than cases (median 3 and
2, respectively), although the difference was not statistically
significant. About half of both cases and controls had
smoked in the past, for about 8 and 4 pack-years (med- 
ian), respectively (p = 0.187). Characteristics of the 
serum samples and the sample collection are listed in 
Table 2. There was no difference between cases and 
controls regarding sample collection and storage. Char- 
acteristics of the subjects in the subset (analyzed by 2D- 
nanoLC-MS/MS), and of their serum samples, are 
shown in Additional files 4 and 5.

Breast cancer was diagnosed after a median time of 
21.3 months (inter-quartile range (IQR): 0.7-26.6) after 
enrollment. More than 80% of the cases had an invasive 
tumor. More than half of the invasive tumors were diag- 
nosed in Stage I and a quarter of the invasive tumors 
were diagnosed in Stage IIA. Only one tumor was diag- 
nosed in Stage IIIA. The invasive tumors were more or 
less equally distributed over the three size categories 
(>0.1-1 cm, 1-2 cm and >2 cm). In almost 30% of the 
invasive tumors, lymph nodes were involved. None of 
the cases was diagnosed with distant metastasis. We 
reported the pathologically determined tumor size and 
lymph node involvement unless this was unknown; in 
that case we reported the clinically determined stage. 
Cases in the subset analyzed by 2D-nanoLC-MS/MS 
were diagnosed 0.9 months (median) (IQR: 0.6-7.5) after 
enrollment. Two of the 20 cases were diagnosed with 
carcinoma in situ. Two thirds of the invasive tumors 
were diagnosed in Stage I and almost a quarter in Stage 
IIA; the remaining tumors were diagnosed in Stage IIB.
Half of the invasive tumors were sized <1 cm, and in 
only three invasive tumors lymph nodes were involved. 

Peaks detected with SELDI-TOF MS 

After normalization, 25 of the 272 spectra (68 cases and 
68 controls in duplicate) had to be eliminated from the 
analysis. These outliers included 12 spectra of cases and 
13 spectra of controls. Of one case and two controls 
both spectra (duplicates) had to be eliminated. With the 
BMW software application, in total 47 different peaks 
were auto-detected in the three batches. Twenty-two of
these peaks were present with an S/N >2 in at least 50%
of the spectra in each batch. The median CV's of these
peaks varied between 12% and 35%.

Table 1 Study population characteristics

                                                  Cases             Controls          P-value††
                                                  (n = 68)          (n = 68)
Age at enrollment (years)
  Mean (SD)                                       60.2 (5.6)        60.3 (5.7)        0.966
BMI
  Mean (SD)                                       26.6 (3.1)        26.3 (3.6)        0.603
  Missing                                         1
Use of oral contraceptives, n (%)
  No, but used to in the past                     36 (52.9)         40 (58.8)         0.490
  No, never                                       32 (47.1)         28 (41.2)
Duration of oral contraceptives use* (years)
  Median (IQR)                                    10.0 (4.3-15.8)   4.5 (2.0-10.0)    0.018
Use of HT, n (%)
  No, but used to in the past                     7 (10.3)          6 (8.8)           0.771
  No, never                                       61 (89.7)         62 (91.2)
Duration of HT use* (years)
  Median (IQR)                                    1.0 (1.0-8.0)     2.0 (1.0-10.0)    0.273
Ovariectomy, n (%)
  Both ovaries removed                            5 (7.4)           3 (4.5)           0.479
  Missing                                                           1
Parity, n (%)
  Nulliparous                                     10 (14.7)         5 (7.4)           0.171
Number of children†
  Median (IQR)                                    2.0 (2.0-3.0)     3.0 (2.0-3.0)     0.100
Smoking, n (%)
  No, but used to in the past                     31 (45.6)         34 (50.0)         0.607
  No, never                                       37 (54.4)         34 (50.0)
Pack-years smoking until stop date‡
  Median (IQR)                                    7.9 (1.9-16.4)    4.1 (1.4-10.2)    0.187
  Missing                                         1                 3
Alcohol intake (g/day)§
  Median (IQR)                                    2.0 (0.2-7.2)     2.5 (0.2-8.4)     0.638
Use of medicines, minerals or vitamins#, n (%)
  Yes                                             46 (67.6)         44 (64.7)         0.717
  No                                              22 (32.4)         24 (35.3)
Time since last meal and/or drink** (min)
  Median (IQR)                                    108 (87-137)      116 (88-137)      0.651

SD: standard deviation; BMI: body mass index; IQR: inter-quartile range; HT: menopausal hormone therapy; * Among former oral contraceptives/HT users; †
Among women with children; ‡ Among former smokers; § Energy-adjusted alcohol intake at enrollment; # In last week before blood collection; ** At moment of
blood collection; †† Independent samples T test for age and BMI, Mann-Whitney U test for other continuous variables, and Pearson Chi-Square test for categorical
variables.

The intensity of a peak with mass-to-charge ratio (m/ 
z) 3323 was statistically significantly higher in pre-diag- 
nostic breast cancer serum samples than in serum sam- 
ples of controls (p = 0.02). The intensity of a peak with 
m/z 8938 was borderline statistically significantly higher 
in cases than in controls (p = 0.06) (Figure 1). No statistically
significant relations were found between the
intensities of the other detected peaks and the early
presence of breast cancer. Correction for multiple testing
revealed that none of the detected peaks had less
than a 10% chance of being a false positive finding. The 22
detected peaks ordered by their m/z, together with their 
mean Z-log-transformed peak intensities in cases and 
controls, and the results of the paired T test are listed 
in Table 3. 

Bivariate conditional logistic regression analysis
revealed that the relations between m/z 3323 and breast
cancer, and m/z 8938 and breast cancer, were independent
of BMI, oral contraceptives use, HT use, number
of children, smoking habits, alcohol intake, duration of
blood sample in refrigerator between collection and centrifugation,
or serum sample storage duration at -86°C
before storage at liquid nitrogen (data not shown).

Table 2 Characteristics of the serum samples

                                          Cases           Controls        P-value
                                          (n = 68)        (n = 68)
Serum sample storage duration* (years)
  Mean (SD)                               11.2 (1.1)      11.2 (1.1)      0.900§
Hours in refrigerator†
  Median (IQR)                            22 (21-23)      22 (20-23)      0.845#
Days at -86°C‡
  Median (IQR)                            8 (6-11)        7 (5-11)        0.429#

SD: standard deviation; IQR: inter-quartile range; * Until date of experiment; †
Between collection and centrifugation; ‡ Between centrifugation and storage
at liquid nitrogen; § Independent samples T test; # Mann-Whitney U test.



Twenty-three cases were diagnosed based on the first 
screening after enrollment, 43 cases had a negative 
mammogram at first screening and were diagnosed at a 
later moment. The mean Z-log-transformed intensity of 
m/z 3323 was not different between the early breast 
cancer cases and the very early breast cancer cases (0.22 
(SD: 0.96) and 0.21 (SD: 1.00), respectively; p = 0.99). The
mean Z-log-transformed intensity of m/z 8938 was 
somewhat higher in the early breast cancer cases, com- 
pared to the very early breast cancer cases, although not 
statistically significantly (0.23 (SD: 0.86) and 0.16
(SD: 1.11), respectively; p = 0.79).

Identities of the SELDI-TOF MS peaks 

Based on results of a previous study performed by our
group [26], the peak with m/z 3323 is likely to be doubly
charged apolipoprotein C-I. We previously identified
a 6.6 kDa peak as apolipoprotein C-I (molecular weight
(MW): 6631 Da) by biomarker purification, in-gel tryptic
digestion and peptide mapping. Its identity was confirmed
with an immunoassay. In the same study, a
highly correlated 3.3 kDa peak was found to be the
result of doubly charged apolipoprotein C-I ions [26].
Although these peaks were detected on different ProteinChip
arrays (CM10 cation exchange surface), this
protein may also bind to the IMAC30 Ni-metal-affinity
surface. An extra argument is that besides m/z 3323, we
also detected the peak representing apolipoprotein C-I
itself in the current study (m/z 6637). Although its relationship
with early stage breast cancer was not statistically
significant (p = 0.23), the Z-log-transformed
intensities of m/z 6637 and m/z 3323 detected in the
current study were also correlated (Pearson R² = 0.558
(p < 0.001) in the controls), as expected between a protein
and its doubly charged ion.

Figure 1 Difference in protein expression of m/z 3323 and m/z 8938, detected with SELDI-TOF MS, between breast cancer cases and
healthy controls.

Table 3 The Z-log-transformed intensities of the peaks
detected with SELDI-TOF MS, ordered by their m/z

          Mean Z-log-transformed intensity (SD)
M/z       Cases (n = 65)    Controls (n = 65)    Intensity in cases    P-value
                                                 vs. controls          (paired T test)
2958      0.06 (0.95)       -0.03 (1.02)         -                     .61
3323      0.21 (0.98)       -0.19 (0.97)         Higher                .02
3888      0.12 (1.00)       -0.13 (0.99)         -                     .18
4649      0.03 (0.95)       -0.02 (1.00)         -                     .80
4824      -0.09 (0.85)      0.13 (1.11)          -                     .21
5343      0.09 (0.87)       -0.05 (1.08)         -                     .43
5911      0.00 (0.76)       0.03 (1.16)          -                     .86
6117      0.03 (0.90)       -0.01 (1.06)         -                     .81
6439      0.09 (0.99)       -0.11 (1.01)         -                     .27
6637      0.11 (0.97)       -0.10 (1.03)         -                     .23
6842      0.08 (1.01)       -0.07 (1.00)         -                     .43
6948      -0.12 (1.01)      0.14 (0.93)          -                     .15
7476      -0.04 (0.94)      0.05 (1.06)          -                     .62
7772      0.06 (0.95)       -0.08 (1.05)         -                     .45
7978      0.11 (0.97)       -0.12 (1.02)         -                     .21
8148      0.11 (0.97)       -0.12 (1.02)         -                     .21
8609      -0.04 (1.02)      0.07 (0.97)          -                     .55
8938      0.19 (1.02)       -0.14 (0.94)         Higher                .06
9294      0.03 (0.89)       0.00 (1.03)          -                     .88
9427      0.15 (1.09)       -0.17 (0.88)         -                     .08
9501      0.06 (0.87)       -0.05 (1.06)         -                     .54
13892     -0.08 (0.97)      0.12 (0.95)          -                     .26

M/z: mass-to-charge ratio; SD: standard deviation; -: No statistically significant
difference in intensity

The peak with m/z 8938 is likely to be C3a des-arginine
anaphylatoxin (C3a desArg) (MW: 8939 Da), based
on a previous study by our group [27]. In that study a
peak with m/z 8937 was identified as C3a desArg by
protein purification and in-gel tryptic digestion, followed
by peptide mapping. The identity of the peak was confirmed
by sequencing the tryptic digest peptides by
quadrupole-time-of-flight MS and by an immunoassay
on Protein A beads [27].

Proteins detected with 2D-nanoLC-MS/MS 

In total, 110 different proteins were detected in the samples
of the 20 case-control pairs with 2D-nanoLC-MS/MS.
For only 32 of the detected proteins, ratios and EFs
could be calculated for at least 14 of the 20 case-control
pairs (Table 4). Afamin, apolipoprotein E and an isoform
of inter-alpha trypsin inhibitor heavy chain H4
(ITIH4) were statistically significantly higher (p < 0.05)
in cases than in controls (weighted mean ratio: 1.10
(95%CI: 1.02-1.18), 1.13 (95%CI: 1.01-1.26) and 1.08
(95%CI: 1.03-1.14), respectively). Alpha-2-macroglobulin 
and ceruloplasmin were statistically significantly lower 
(p < 0.05) in cases than in controls (weighted mean 
ratio: 0.94 (95%CI: 0.88-1.00) and 0.94 (95%CI: 0.89- 
0.99), respectively). After correction for multiple testing 
using the FDR, ITIH4 appeared to have less than 10% 
chance to be a false positive finding. 

Discussion 

We found several proteins that showed different intensi- 
ties in pre-diagnostic serum samples of breast cancer cases 
not yet showing clinical symptoms compared to samples 
of healthy controls. Two proteins detected with SELDI-TOF MS,
one with m/z 3323, which is likely to be a doubly
charged ion of apolipoprotein C-I, and another with
m/z 8938, which is likely to be C3a desArg, were found to be
related to pre-diagnostic breast cancer. Of the proteins
detected with 2D-nanoLC-MS/MS, afamin, apolipoprotein 
E and an isoform of ITIH4 were slightly, but significantly 
higher and alpha-2-macroglobulin and ceruloplasmin 
slightly, but significantly lower in pre-diagnostic breast 
cancer samples compared to control samples. Although 
correction for multiple testing revealed that only ITIH4 
had less than 10% chance to be a false positive finding, 
several of the other proteins have previously been found in 
relation with symptomatic breast cancer. M/z 3323, which
probably represents the doubly charged ion of apolipoprotein
C-I, showed the largest difference between cases and
controls. Apolipoprotein C-I itself, detected both with 
SELDI-TOF MS (m/z 6637) and 2D-nanoLC-MS/MS, 
showed results in the same direction, i.e. higher in cases, 
but not statistically significantly. In a study by Engwegen 
et al. [26], examining serum samples taken after diagnosis, 
the doubly charged ion of apolipoprotein C-I was lower in 
breast cancer cases, but not statistically significantly. Apo- 
lipoprotein C-I itself (6631 Da), was statistically signifi- 
cantly lower in breast cancer cases in that study [26]. It is 
striking that the same protein was found to be related 

Table 4 Proteins detected with 2D-nanoLC-MS/MS in 14 pairs or more

Protein name                                                Function                  Pairs* (n)  Weighted ratio+ (mean)  95% CI      p-value (random fixed effects model)
Vitronectin                                                                           16          1.06                    0.99-1.13   .07
Transthyretin                                                                         15          0.98                    0.88-1.09   .71
Alpha-1B-glycoprotein                                                                 20          1.03                    0.99-1.07   .17
Alpha-2-macroglobulin                                       Proteinase inhibitor      20          0.94                    0.88-1.00   .04
Afamin                                                      Serum transport protein   17          1.10                    1.02-1.18   .02
AMBP protein                                                                          18          1.04                    0.97-1.12   .26
Apolipoprotein A-I                                                                    20          1.04                    0.97-1.12   .25
Apolipoprotein A-II                                                                   20          0.98                    0.91-1.06   .61
Apolipoprotein A-IV                                                                   20          1.05                    0.95-1.18   .32
Apolipoprotein B-100                                                                  20          1.06                    0.99-1.12   .08
Complement C3 (Fragment)                                                              20          1.02                    0.98-1.06   .34
Isoform 1 of Complement factor H                                                      18          1.02                    0.97-1.07   .39
Ceruloplasmin                                               Acute phase reactant      20          0.94                    0.89-0.99   .03
Hemopexin                                                                             20          0.96                    0.90-1.02   .17
Histidine-rich glycoprotein                                                           20          0.96                    0.89-1.04   .30
Inter-alpha trypsin inhibitor heavy chain H1                                          16          1.00                    0.94-1.07   .89
Alpha-1-acid glycoprotein 2                                                           16          1.00                    0.93-1.08   .91
Inter-alpha (Globulin) inhibitor H2                                                   18          0.96                    0.89-1.04   .33
Orosomucoid 1                                                                         20          1.06                    0.98-1.14   .16
Alpha-1-antichymotrypsin                                                              19          1.00                    0.92-1.08   .94
B-factor, properdin                                                                   18          0.99                    0.94-1.05   .76
Plasminogen                                                                           18          0.98                    0.90-1.07   .69
Alpha-2-HS-glycoprotein                                                               18          1.00                    0.94-1.07   .88
Beta-2-glycoprotein 1                                                                 18          0.99                    0.92-1.05   .66
C4B1                                                                                  16          0.99                    0.93-1.05   .74
Prothrombin (Fragment)                                                                16          0.98                    0.88-1.10   .76
Apolipoprotein E                                            Lipid metabolism          16          1.13                    1.01-1.26   .04
Apolipoprotein C-I                                                                    14          1.02                    0.94-1.12   .57
13 kDa protein                                                                        14          1.03                    0.92-1.16   .55
Isoform LMW of Kininogen-1                                                            18          1.06                    0.98-1.14   .13
Isoform 1 of inter-alpha trypsin inhibitor heavy chain H4   Acute phase reactant      16          1.08                    1.03-1.14   <.01
Vitamin D-binding protein                                                             20          1.03                    0.99-1.07   .17

95%CI: 95% confidence interval; * Number of pairs in which a ratio could be determined and an EF could be calculated; + Ratio case/control

with breast cancer in both studies, but in different direc- 
tions. This may be due to differences in sample collection, 
processing and storage, but also to the differences in stage 
of disease of the two study populations. We included sam- 
ples collected up to three years before diagnosis, while in 
the study by Engwegen et al. [26] samples were collected 
after diagnosis. Apolipoprotein C-I may be differently 
expressed in pre-diagnostic stages of breast cancer com- 
pared to stages visible on a mammogram and/or leading 
to clinical symptoms. It is also possible that the result is a 
chance finding. 
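
As an illustration of how per-pair case/control ratios such as those summarized in Table 4 could be pooled into a weighted ratio with a confidence interval and p-value, the following Python sketch combines log-transformed ratios with inverse-variance weights in a DerSimonian-Laird random-effects model. It is a minimal sketch under stated assumptions and not necessarily the procedure used in this study: the function name `pool_log_ratios`, the reading of the error factor (EF) as the multiplicative half-width of a 95% confidence interval, and the example values are all hypothetical.

```python
import math


def pool_log_ratios(ratios, error_factors, z=1.96):
    """Pool per-pair case/control ratios on the log scale.

    Assumption (not taken from the paper): each error factor EF defines a
    95% CI of ratio/EF .. ratio*EF, so the standard error of ln(ratio)
    is ln(EF) / z.
    """
    if len(ratios) != len(error_factors) or len(ratios) < 2:
        raise ValueError("need at least two ratio/EF pairs of equal length")
    if any(r <= 0 for r in ratios) or any(ef <= 1 for ef in error_factors):
        raise ValueError("ratios must be > 0 and error factors must be > 1")

    y = [math.log(r) for r in ratios]                      # log ratios
    v = [(math.log(ef) / z) ** 2 for ef in error_factors]  # within-pair variances

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird estimate of the between-pair variance, floored at 0.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects weights and pooled log ratio.
    w_re = [1.0 / (vi + tau2) for vi in v]
    sw_re = sum(w_re)
    mean = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    se = math.sqrt(1.0 / sw_re)

    # Two-sided p-value from the normal approximation.
    p_value = math.erfc(abs(mean / se) / math.sqrt(2.0))
    return {
        "ratio": math.exp(mean),
        "ci_low": math.exp(mean - z * se),
        "ci_high": math.exp(mean + z * se),
        "p_value": p_value,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical ratios and error factors for one protein in five pairs.
    example = pool_log_ratios(
        ratios=[1.12, 1.05, 0.98, 1.20, 1.08],
        error_factors=[1.15, 1.20, 1.10, 1.30, 1.25],
    )
    for key, value in example.items():
        print(f"{key}: {value:.3f}")
```

Working on the log scale keeps the pooled ratio symmetric around 1, so that a doubling and a halving of equal precision cancel out; the between-pair variance term widens the interval when the pairs disagree more than their individual error factors would suggest.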

M/z 8939, probably representing C3a desArg, which we
found to be higher in pre-diagnostic breast cancer 
samples, has been found to be related to breast cancer 
in several previous SELDI-TOF MS studies
[2,3,6-8,28]. In the majority of these studies the protein
was higher in patients compared to controls
[3,6-8], but in two studies it was lower [2,9]. ITIH4 
was higher in our pre-diagnostic breast cancer sam- 
ples than in the control samples. This is a protein of 
which fragments have been frequently described in 
relation to symptomatic and/or mammographically 
detectable breast cancer [6-9,29-31]. In these studies 
levels of a 4.3 kDa ITIH4 fragment were found either 
to be significantly higher [7,30], or significantly lower 
[6,8,9] in breast cancer. Levels of other fragments of 
ITIH4, which were investigated by Villanueva et al. 
[29], Song et al. [30], and our own group [31], were 
usually found to be higher in breast cancer or were 
not related at all [29,30]. 

To our knowledge, afamin, apolipoprotein E, alpha-2-macroglobulin
and ceruloplasmin have not been found
before to differ between breast cancer serum samples 
and control serum samples in studies using SELDI-TOF 
MS or other profiling methods. In the 1980s however, 
the acute phase proteins alpha-2-macroglobulin and cer- 
uloplasmin were already studied in relation to breast 
cancer, using immunoassay methods [32,33]. Serum 
levels of alpha-2-macroglobulin did not differ between 
breast cancer patients and women with benign breast 
disease [32]. In our study, alpha-2-macroglobulin and 
ceruloplasmin were both lower in pre-diagnostic breast 
cancer samples compared to the control samples. 

It may be a limitation that we did not perform struc- 
tural identification, and validation of the discriminative 
power in an independent validation set, of the two dis- 
criminative proteins detected with SELDI-TOF MS. 
However, it is very likely that these proteins are acute 
phase reactants, which are not cancer specific, let alone 
breast cancer specific. Therefore, we decided not to 
invest in structural identification and validation. More- 
over, another similar study population was not available 
for validation. Nevertheless, it is very interesting that
these kinds of proteins are already discriminative up to
three years before the diagnosis of breast cancer. Our
results should therefore draw attention not to these
specific proteins and their potential as breast cancer
biomarkers, but rather to the fact that an inflammatory
process is already measurable up to three years before
diagnosis, at a time when only a few tumor cells or a
very small tumor may be present.

The most important strength of our study is that we 
investigated proteomic profiles in serum of patients with 
asymptomatic breast cancer (diagnosed after a median 
time of 21.3 months (IQR: 0.7-26.6) after enrollment). 
Our study population therefore is more appropriate for 
finding early breast cancer biomarkers than all previous 
studies where mostly symptomatic cases were included. 
The case-control design, nested in a cohort of apparently
healthy screening participants, also ensures that all
serum samples were collected, processed and stored uniformly
under strictly defined conditions, at a time when
none of the participants had yet been diagnosed with breast
cancer. These factors have been shown to be important in
protein profiling studies [10-16]. In this way systematic
errors due to differences in these factors between cases 
and controls were prevented in our study. Moreover, we 
were able to control for many (possible) confounding 
variables, by including only post-menopausal women, 
who never had cancer before, were not diabetic, were 
not current smokers, and did not currently use oral con- 
traceptives or menopausal hormone therapy [20]. 
Furthermore, we could correct the results for age, BMI,
past oral contraceptive and HT use, number of children,
past smoking habits, alcohol intake, and several serum
sample characteristics.

A limitation of our study is that, due to the strict 
selection criteria and the limited availability of pre-diag- 
nostic serum samples of breast cancer cases, we were 
only able to include 68 case-control pairs in our study. 
Due to time and cost restrictions, for the 2D-nanoLC-MS/MS
analysis we only included the 20 cases that
were diagnosed with breast cancer within the first 14
months after enrollment in the study, and their matched
controls. These sample sizes are limited, but the strict
selection criteria also prevented bias and confounding.

By measuring the protein profiles with both SELDI-TOF MS
and 2D-nanoLC-MS/MS, we benefited from the
advantages of two complementary methods. SELDI-TOF
MS has the advantage of simultaneously measuring parts
of the serum proteome in a high-throughput fashion,
with relatively simple sample preparation, high analytical
sensitivity and high speed of data acquisition [34,35].
Although fewer samples can be measured simultaneously
with 2D-nanoLC-MS/MS, this method has the advantage
that it can identify the detected proteins immediately.
Moreover, the protein detection by these two
methods is complementary. Because SELDI-TOF MS
mainly measures proteins in the 2 to 10 kDa mass
range, many break-down products can be detected.
Additionally, by measuring exact mass-to-charge ratios
with SELDI-TOF MS, it is also possible to detect
post-translationally modified forms of proteins, for example
proteins with additional amino acids or truncated forms.
With 2D-nanoLC-MS/MS in combination with iTRAQ
labeling, a higher selectivity is reached because tryptic
peptides are analyzed and protein identification is based
on sequence information. This allows identification of
proteins of higher mass, which cannot be detected
with high sensitivity by SELDI-TOF MS.

Conclusions 

We detected several serum proteins that differed in
concentration between women with asymptomatic breast
cancer and matched healthy controls. For some of the
proteins this may have been a chance finding, but C3a desArg
and ITIH4 have previously also been found in relation
to symptomatic breast cancer. Remarkably, highly
abundant acute phase proteins, which we expected only
to be detectable in symptomatic cancer cases, were also
found to be significantly higher before diagnosis. Given
that the currently identified proteins are highly abundant,
they are unlikely to be breast cancer specific, at least on
their own. The fact, however, that inflammatory processes
are already present up to three years before diagnosis
needs to be further investigated. In the search for
specific tumor markers, we should take into account
that these are of low abundance, as known
circulating tumor markers typically have low concentrations
[36]. Techniques that give insight into 'the deeper/
low abundant proteome', e.g. fractionation of the
samples or depletion of a higher number of the most
abundant proteins, which was already partially done in
the 2D-nanoLC-MS/MS analysis, may help to find these
low-abundance and probably more specific tumor
markers.

Additional material 



Additional file 1: SELDI-TOF MS data collection. Settings of the 
ProteinChip Reader, way of processing the spectra and settings for peak 
detection. 

Additional file 2: 2D-nanoLC-MS/MS analysis. Details about the used 
columns and mobile phases, and about the acquisition and calibration. 

Additional file 3: 2D-nanoLC-MS/MS data analysis. Details on search 
parameters for identification, and on data processing for quantification. 

Additional file 4: Characteristics of the subset. Characteristics of the 
subjects in the subset analyzed by 2D-nanoLC-MS/MS. 

Additional file 5: Characteristics of the serum samples in the subset.
Characteristics of the serum samples in the subset analyzed by
2D-nanoLC-MS/MS.